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ABSTRACT 

The Fourier transform of cosmological density perturbations can be repre- 
sented in terms of amplitudes and phases for each Fourier mode. We investigate 
the phase evolution of these modes using a mixture of analytical and numerical 
techniques. Using a toy model of one-dimensional perturbations evolving under 
the Zel'dovich approximation as an initial motivation, we develop a statistic 
that quantifies the information content of the distribution of phases. Using 
numerical simulations beginning with more realistic Gaussian random-phase 
initial conditions, we show that the information content of the phases grows 
from zero in the initial conditions, first slowly and then rapidly when struc- 
tures become non-linear. This growth of phase information can be expressed 
in terms of an effective entropy: Gaussian initial conditions are a maximum 
entropy realisation of the initial power-spectrum, gravitational evolution de- 
creases the phase entropy. We show that our definition of phase entropy results 
in a statistic that explicitly quantifies the information stored in the phases of 
density perturbations (rather than their amplitudes) and that this statistic 
displays interesting scaling behaviour for self-similar initial conditions. 
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1 INTRODUCTION 

Observations of large-scale structure in the spatial distribution of galaxies pose some of the most interesting challenges 
facing modern cosmological theory. Prominent among the issues raised by the confrontation is the question of how 
to best to use the information contained in maps of galaxy positions to constrain theoretical models. 

Traditional methods for the analysis of such maps involves a Fourier decomposition, treating the cosmological 
density contrast as a superposition of plane waves with (complex) amplitudes 



5(x) = 



P(x) - po 

PQ 



= ^ Ik exp(ik ■ 



(1) 



where po is the mean matter density. We discuss this kind of spectral representation in more detail in Section 4. The 
statistical analysis of galaxy clustering generally proceeds via the properties of the amplitudes 5k ■ A particularly useful 
statistical quantity in this respect is the power spectrum, essentially proportional to |(5fep, which is a second-order 
statistic in Fourier space. Higher order quantities based on 5k can also be found, such as the bispectrum (Peebles 



Figure 1 . A simple demonstration of the importance of phase information for pattern morphology. Panel (a) shows an example 
realisation of an N-body experiment with initial random phases. Panel (b) is obtained by taking the Fourier transform of panel 
(a), setting all the amplitudes to a constant {\&k\ = 1) then taking the inverse Fourier transform; it therefore retains only the 
phase information from (a). In panel (c), each mode keeps the same amplitude so the power-spectrum is unchanged but we 
redistribute the phases randomly among the modes. It is easy to see the striking resemblance between (a)and (b), the phase-only 
reconstruction, but (c) which has the same power-spectrum but random phases, is featureless. 
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1980; Matarrese, Verde & Heavens 1997) or through correlations of 5l (e.g. Stirling & Peacock 1996). However, an 
alternative approach exists that has not been explored greatly in the cosmological literature, and that is to look 
explicitly not at the amplitudes (the moduli of 6k) but their phases (j)k where 

5k = |5fc|exp(i^fc). (2) 

The importance of phase information can bo understood by considering a simple example. Gaussian white noise 
consists of a superposition of Fourier modes with random phases and whose amplitudes are drawn from a distribution 
which is independent of wavenumber, i.e. a flat spectrum. Now imagine a density field represented as a single Dirac 
<5-function in a periodic volume. The Fourier transform of this density field is constant in amplitude, so that its power 
spectrum corresponds exactly to the white noise spectrum. What differs drastically in these two cases is the difference 
in phases. In the former, the phases are random, in the latter, the phase of each mode is a definite relationship to the 
other modes resulting in a real-space distribution that is strongly localised. This second distribution clearly represents 
a more ordered state than the first one, but their power spectra are identical. A more relevant example is shown in 
Figure 1. The left panel shows a grey-scale representation of the density field obtained from a two-dimensional N-body 
simulation. In the centre panel, the amplitudes of the Fourier components arc set to a constant value, but the phases 
are retained. The morphology of the density field is very similar to the original. The result of randomly reshuffling 
the phases but retaining the amplitudes is shown on the right; information about the shape of the original pattern 
is destroyed. One can conclude from these examples that key information about the shape of localised features in a 
spatial pattern resides in the distribution of phases, and that information should in principle be quantifiable. Further 
interesting general examples of the importance of phase information for the analysis of images (in the widest sense 
of the word) can be found in Oppenheim & Lim (1981). The question is, how to quantify the information content of 
the distribution of phases? 

The density distributions of interest in cosmology are, of course, more complicated than these toy examples. 

Moreover, since gravitational clustering is generally thought to be driven by the process of gravitational instability 
we need to understand not only the phase information in "snapshots" of the density field, but the way in which phase 
information evolves with time. 

The standard model for the formation of galaxies and large-scale structure involves the idea that these structures 
grew by the action of gravitational instability from small initial density perturbations present in the early Universe. 
In most popular variants of this model, particularly those involving cosmic inflation, the initial perturbations are 
Gaussian, meaning amongst other things that their probabilistic properties are completely specified by knowledge of 
second-order statistical quantity: the autocovariance function, or its Fourier transform the power spectrum discussed 
above. Gaussian random fields are useful not only because of this very economical prescription of their statistical 
properties, but also because many properties of Gaussian random density fields can be calculated analytically (e.g. 
Bardeen et al. 1986). One particularly interesting property of Gaussian random fields is that they possess Fourier 
modes whose real and imaginary parts are independently distributed or, in other words, that they have phases which 
are independently distributed and uniformly random on the interval [0, 27r]. Terms in the evolution equations for 
the Fourier modes that represent coupling between different modes are of second (or higher) order in 5 and these 
are neglected when first-order perturbation theory is considered. During the linear regime, therefore, Fourier modes 
evolve independently and the phases remain independent and uniformly random (Peebles 1980; Sahni & Coles 1995). 
In the later stages of evolution, however, modes begin to couple together. It is in this regime that information fiows 
into the set of Fourier phases. 

There are therefore two reasons for studying phase correlations. One is to understand exactly how the morphology 
of non-linear clustering pattern is driven by the growth of phase information. The other is to explore ways of testing 
the initial hypothesis of Gaussian initial conditions, using measurements of phase association to constrain the level 
of possible non-Gaussianity. 

Some papers on this subject refer to the appearance of phase information in terms of phase correlations. In 
a strict sense of the word correlation, which means association expressed through second-order quantities such as 
covarianccs, this is incorrect. As we show in Section 4, any statistically homogeneous field cannot display phase 
correlations. However, it is certainly true that phases are no longer independently distributed in this regime. 

Despite the clear importance of phases for the morphology of galaxy clustering, the relevant literature is relatively 
sparse, with most attention being focused on the evolution of individual phases away from their initial values (Ryden & 
Gramman 1991; Soda & Suto 1992; Jain & Bertschinger 1998). One notable exception is the work by Scherrer, Melott 
& Shandarin (1991) who suggested and explored a method of quantifying phase association that could be applied 
to real data. The sparsity of the existing literature on phases is at least partly due to the difficulty in developing 
statistical methods for quantifying phase information, a difficulty we attempt to address in this work. 

In this paper we discuss the phase evolution and the quantification of the information they represent. After 
presenting some relevant background in Section 2, we begin by exploring the evolution of phases using simplified ID 
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analytical models in Section 3. This study suggests a simple way of encoding phase information which we present in 
Section 5. With the aid of dynamical numerical simulations in Section 6, we present how the encoded information 
from phases evolves with different initial conditions and displays interesting behaviour for self-similar evolutions. In 
Section 7, we discuss the results and offer suggestions on how phase information might be extracted from real surveys. 



2 THEORETICAL BACKGROUND 
2.1 Linear Perturbation Theory 

For the purposes of this study we consider a spatially flat matter-dominated universe. If the mean free path of the 
particles is small, it is appropriate to treat the large-scale distribution of matter as a self-gravitating Newtonian fluid 
with zero pressure. In this case the evolution of the gravitating system can be described in a comoving frame by the 
following three equations: the continuity equation, 

^ + iVx[(l + <5)v = 0; (3) 
Ot a 

Euler equation, 

^^ + (v- Vx)v = -iVxP-Vx0; (4) 
and Poisson equation, 

Vx0 = 4TvGa^po5. (5) 

In these equations 5(x) is the density contrast defined by eq. (1); for a spatially flat and matter-dominated universe 
the mean density po — l/GivGt^. In the equations (^-|^, Vx denotes a derivative with respect to the comoving 
coordinates x, where x — r/a{t), and r are proper coordinates. The velocity field is v(x, = ax and (j){-x.,t) is the 
peculiar gravitational potential. The regime in which 5^1, permits one to linearise and ^ by keeping only 
terms of first order in the perturbed quantities. We thus obtain 



Solving this equation with the initial condition 5{'Si,ti) = 5o(x) and (S(x,fi) = at ti, when the perturbations start 
growing, we get 

5(x,t) = 5ob±(t), (7) 

with the growing mode, b+ oc t^^^ , and the decaying mode, b- cc t~^ . This solution describes the growth of pertur- 
bations at early times. In this regime the evolution of each individual Fourier mode, 5k, of 5 is decoupled from the 
other modes and its rate of growth is independent of k. 

The linear solution breaks down when 5 is compatible with unity or beyond, because terms of higher order than 
the linear terms become important. To probe the onset of non-linearity we therefore need to use more sophisticated 
approximation methods than first-order perturbation theory (see, e.g. Sahni & Coles 1995, for a thorough review). 



2.2 The Zel'dovich Approximation 

Zel'dovich (1970) proposed a particularly ingenious approximate scheme which can be used to extrapolate the evo- 
lution of density fluctuations well into the non-linear regime. In the eponymous Zel'dovich approximation (ZA), a 
particle initially placed at Lagrangian coordinate q finds itself after a time t at an Eulerian coordinate is x. The 
displacement it has experienced simply depends on the velocity the particle had when it was at its original position, 
through b(t)u(q), so that 

r(q,t) =a(t)x(q,t) =a(t)[q + fe(t)u(q)], (8) 

where a(t) is the expansion factor and b{t) is the growing mode 6+ in (^). According to this prescription, the motion 
of the medium is simply Newtonian inertial motion: each particle moves with its original velocity along a ballistic 
trajectory. [Note that the peculiar velocity according to the Zel'dovich approximation is ax = a(t)6(t)u(q).] For an 
irrotational flow, u(q) can be expressed as a gradient of some velocity potential — V'l>o(q). 

The evolutionary picture that the Zel'dovich approximation suggests is that particles set out on inertial trajec- 
tories with their initial velocities determined by the initial velocity potential 'l>o(q) and follow straight lines with 
constant velocities. When two such trajectories meet, a singularity forms and both their positions, being distinct in 
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Lagrangian space, are mapped onto the same Eulerian coordinate. Such a singularity is called a caustic and its for- 
mation is called shell-crossing. Because the Zel'dovich approximation does not deal properly (or, indeed, at all) with 
the self-gravity of regions formed when such caustics occur, the two particles entering the singularity do not coalesce 
in a bound structure, but their paths cross each other and form multi-stream flows. The approximation breaks down 
at this point, and the mapping from Lagrangian to Eulerian space can no longer be inverted. Nevertheless, before 
shell crossing, the Zel'dovich approximation provides an excellent approximation to the real evolution of the density 
contrast (e.g. Coles, Melott & Shandarin 1993). 

Before shell-crossing the evolution of 5 can be determined which can be obtained by requiring mass conservation 
between Lagrangian and Eulerian space 

Po d^q = p(x, t) d^x. (9) 
If the flow is assumed irrotational, the matrix dxi/dqj is a symmetric matrix and can be diagonalised. Therefore, 

*^ " (l-6(t)Ai)(l-6(t)A2)(l-6(t)A3) ' 

if — Ai, — A2, —A3 are the eigenvalues of dui{q)/dqj. Thus, the condition that a caustic forms is the same condition 
as that one eigenvalues of the matrix dxi/dqj should be equal to l/h{t). 

The Zel'dovich approximation breaks down entirely in the highly non-linear regime, but this is not too great a 
problem for the problem at hand. As other studies of phase evolution have shown (Ryden & Gramman 1991; Scherrer, 
Melott & Shandarin 1991; Soda & Suto 1992; Jain & Bertschinger 1998), the phases of Fourier modes <^fc begin to 
move very rapidly away from their original values when density fluctuations become very large. This poses a 
problem for the interpretation of phase information when each one wraps around many multiples of 2n. In fact, what 
happens in the non-linear regime is that phases move so quickly for short-wavelength modes that their displacements 
become very much larger than 27r. The 'observed' phases, measured modulo 27r, then appear random because of the 
heavy phase wrapping in the strongly non-lineax regime. This is is a difficult problem to deal with, so we will simply 
skirt around it and consider only what happens on scales just entering the non-linear regime. 



2.3 The Zel'dovich Approximation in One Dimension 

Although the Zel'dovich approximation (ZA) is a very useful tool for following the fully three-dimensional evolution 
of density perturbations, it is particularly noteworthy that the solution it provides in one dimension is exact, at least 
until the first shell-crossing. In one dimension, including the case of a single plane wave in 3D, the value of 5 calculated 
from the Poisson equation is the same as that found using the ZA. The ZA is therefore a particularly useful tool to 
explore the evolution of the density contrast in ID; every particle carries information about its position and velocity 
during the evolution. In one dimension, the ZA is simply 

x{q,t) = q-h{t)^^, (11) 
and the density contrast is 

until the first singularity develops, at which point 

dx 

Tq='' 
or 

\ / min 

The velocity potential field is closely related to gravitational potential field, but it is important to realise that this 
is only relevant to the initial velocities of particles and not to their final positions; density pealcs talce place, not 
at the minima of potential wells, at the positions where the change of curvature in the potential field is zero, i.e. 
d^^o/dq^ = with d'^c&o/dg* at their minima. 

The fact that the ZA is exact in one dimension until shell-crossing allows us to study this case with particular 
ease (cf. Soda & Suto 1992). What wc shall do in the next section is to study the evolution of phases in ID in order to 
gain at least a qualitative understanding of how non-linear evolution leads to an increase in the information content 
of the set of Fourier phases. These calculations represent, for the time being, toy models, but as we shall see they 
will lead to the suggestion of a very useful way of quantifying phase evolution in the more general case. 
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3 ONE-DIMENSIONAL ANALYTIC CALCULATIONS 

The Fourier transform of the density contrast 5{x) in one dimension can be expressed as 



1 

2^ 



5{x) exp{ikx) dx = Afcc" 



(15) 



where (fik is the phase of the fc-th mode, and Afe = |<5fe|, which is real, is its amplitude; c.f. equation (g). Here it is 
assumed that the universe is periodic in one dimension with a period of 27r, i.e. x{q + 2jV) = x{q). Typically, we can 
represent the velocity potential as the sum of sinusoidal functions, i.e. 



*o('7) = > AkCos{kq + ak); 



k 



(16) 



the assumption of periodicity requires that k here must be an integer. 

Getting an analytic solution for the Fourier modes of the density distribution evolved according to the Zel'dovich 
approximation is not as straightforward as one might think because the Fourier transform ( p^ is made from real-space 
Eulerian coordinate, which in turn represents a mapping from the Lagrangian coordinate. From ( ]l5| ) 



2tt 



dx 



ikx J 1 

e dx = — 
27r 



ikx J 1 



exp 



ik 



d$()(g) 



dq. 



(17) 



Obviously, the case of an arbitrary 5{x) comprising a number of different Fourier modes is going to be difficult 
to analyse, so we will begin by looking at simpler cases. 



3.1 Single Mode 

The simplest case we can consider is one in which the velocity potential is described by a velocity potential comprising 
a single sinusoidal function: 

$o(<j) = -cos(g + Q). (18) 
An analytical solution is available in this case 

1 r 

6k = — / dq exp{ik[q — b{t) sm{q + a)]} — exp(— ifca) Jjj [fc6(t)], (19) 
27r / 

where Jk[kb{t)] is the Bessel function of the fc-th order and the phase (pk = —ka (Shandarin & Zel'dovich 1989). 
When b(t) <^ 1, A^; oc [kb{t)]'' , indicating the amplitudes of low-frequency modes grow faster than those of high- 
frequency ones. As b{t) is approaching unity, the time when the first caustic forms, the amplitude for k = 1 slows 
down, compared with the extrapolation of the linear theory (in which the amplitudes grow with the same rate). The 
phases, on the other hand, remain the same until shell-crossing. This idealised case demonstrates that the collapse of 
a single plane wave involves the generation of higher-frequency modes with phases determined by the phase of the 
original mode. 



3.2 Double Mode 

The next case we consider is where the initial velocity potential is simply the superposition of two sinusoidal forms: 

•l'o(g) = - cos(fcig) - cos(fc2 -I- a). (20) 

Even in this very simple case, no analytical solution can be obtained. We begin with a particular choice of frequencies 
for the two starting modes, one being the fundamental mode fci = 1 and the other being double this frequency. In 
Figure 2, we show five different values of a for ki — 1 and k2 — 2: a — 0.3, 1.1, 2.0, 2.8, and 3.1. The first row 
shows the corresponding velocity potential. There are two wells in the potential field; consequently two density peaks 
form during the later stages of evolution, shown in the second row. The density contrast 5 is plotted against q when 
b{t) = 0.2 for all five a's. The choice of this b{t) value is arbitrary but to show density contrast at very late stage 
from which phase coupling in Fourier domain can be explored. According to equation (p^, different velocity potential 
fields have different times for the first shell-crossing, so 5 in Figure 2 shows different levels of density peaks. In order 
to elucidate the pattern of phases generated by the evolution of this system, we use a technique borrowed from the 
idea of a return map used in chaotic dynamics (e.g. May 1976). 

To analyse a chaotic time series X{t), sampled at discrete times t (which we assume to be represented by the 
integers), it is useful to plot Xt+i against Xt for each pair of consecutive values {t,t + 1); this is the return map. 
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Figure 2. Doublc-modc calculations with (A;i , ^2) = (1, 2). Five different values of a are drawn for one-dimensional ZA solutions 
with an initial velocity potential <!>() = — cos((jr) — cos(25 + a). There are two potential wells in <3>()> causing two density peaks 
to form. The second row is the density contrast S when 6(f) = 0.2. Both $0 and <5 are periodic between — tt and tt, and are 
shown as functions of the Lagrangian coordinate q. The third row is the corresponding return map, t/ik+i versus t/iky both of 
which are measured modulo 2it. The fourth and fifth rows are the phase (p^ and discrete phase gradient D^., respectively, both 
being drawn against k. Histograms of the phases 4>k show a uniform distribution, whereas in the return maps the coupling of 
consecutive phases is clear. The relationship between the formation of density contrast and the way phases couple each other 
is easy to see via D^. panels. A dominant density peak converges D^. on large k, whereas the shape of on small fc's depends 
on the morphology of the structure. If two compatible density peaJfs form at the same time, converges slowly towards large 
fe's. 



If Xt and Xt+i are independent then the result should be a scatter plot in which no pattern appears. A time series 
consisting of correlated noise, would show some correlation in its return map, whereas a chaotic series would exhibit 
some form of attractor. We adapt this idea to the case in hand, by considering the sequence of phases <j)k in the same 
way. For a 4>k sequence of a density distribution, 4>k are paired for all modes as points 4>k+i) in a return map so 
that one can hope to see some pattern emerging in the relationship between two successive phases. This idea can, in 
principle, be generalised to higher-order maps showing 4>k+2 against 4>k, etc. 
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Figure 3. {ki,k2) = (1,3) for double mode. All the notations follow those in Figure 2. Here three potential wells cause three 
density peaks. 5 is drawn when b(t) = 0.1. The return map is more complicated in this case, but plots show the same 
relationship between consecutive phases as that in Figure 2. A dominant peak causes Di^ to converge more quickly towards 
large k than two compatible ones. 



For the case of the single mode, the <t>k sequence decreases monotonically by fixed quantity a for each step in 
k. In the return map, the phases are mapped into points {{4>i,4>2) = {—a, —2a), {(j)2, (^3) = {—2a, —3a) . . .}, which 
forms in the return map a diagonal line, provided that a is not a rational multiple or fraction of tt. 

Before examining the possible patterns in the return maps, we show in the fourth row of Figure 2 the phase (j>k 
as a function of k. Even for the simplest case of two density peaks, the phases display a complicated behaviour among 
these five different values of a. In the return maps, however, it is clear to see the patterns for the corresponding 
density distributions. The diagonal lines in the first three panels arc from the mapping of consecutive phases of 
high-frequency modes, which resemble the single-mode case. The points oscillating with the diagonal lines are from 
phases of long-wavelength modes. When a approaches tt, the phases are mapped into regions, instead of one single 
line. 
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Figure 4. The evolution of discrete phase gradient for $o = — cos(q) — cos(2g + 2.0). The phase gradient Df^ is shown as a 
function of A; up to fe = 50. In the top left panel, phases from small fe's start coupling at the beginning of the evolution, which 
depends on the initial phase difference between the two sinusoids, whereas those of high-frequency modes, whose amplitudes at 
this stage are still small or zero, are rounding errors. As the density flucuation grows, fine details appear when higher harmonics 
are spawned by the collapsing waves with a definite phase relationship leading to sharp peaks that eventually become caustics. 



A similar effect is noticed in Figure 3. Again, five different potentials with different values of a are drawn for 
ki = 1, and k2 = 3. All the notations follow those in Figure 2. Three potential wells cause three density peaks when 
b{t) = 0.1. We see again in the 0fc panels phases seem randomly distributed as in Figure 2. In the return maps, phases 
are not mapped onto diagonal lines, but much more complicated patterns. As the patterns are the mapping from 4>k 
to (l>k+i, we devise a new quantity to look for how the consecutive phases are coupled 

Dk = 4>k+i — ^k, (21) 

which means that Dk is effectively a discrete phase gradient. As will become clear in the next few sections, a histogram 
from Dk can be used to quantify phase information. Plotting a histogram of the 4>k, however, is a useless diagnostic 
of phase correlations. Such a plot would involve the projection of the return map onto one or other of the axes. Even 
if <j>k and (j)k+i were linearly related, the histogram of phases would be uniform, as can be seen from Figure 2. Instead 
of taking Dk series into the return map, we plot the discrete phase gradient Dk against k in the fifth row both in 
Figure 2 and Figure 3. In Figure 2, the successive phases are coupled in such a way that the series of Dk\k=i,3,6... 
and £)fc|fe=2,4,6... twine together and oscillate embedded in a decay, and in Figure 3, it is £'fc|fc=i,4,7..., Dk\k=2,5,8... 
and Dk\k=3,6,9... that shows a definite relationship between phases. 

Since phases alone can recover the morphology of the structure, what is shown in Dk for (fei,fc2) = (1,2) case 
can be demonstrated and understood qualitatively by two Gaussian waves, or three for(fei,fc2) — (1,3), the reason 
being Gaussian function is easy to manipulate between real and Fourier domain. It turns out that the resulting Sk 
of mode fe is from vector superposition in the complex plane, i.e., S'j}^ + both being the fe-th mode of Fourier 
transform of two density peaks. The number of twining series was caused by the number of density peaks. When 
a = IT, two identical peaks located at qi and — gi form symmetrically to the origin in real domain, so phases of 
the fe-th mode is formed by superposition in the complex plane of two vectors with same amplitude, whose phases 
are kqi and —kqi, multiples of the fundamental mode and always symmetric to the a;-axis. This results in 0fc = 
or TV, so Dk flip between and tt. Bearing this picture in mind, we can understand qualitatively why, when a is 
small, the dominant sharp peak produces fast convergence of Dk on high-frequency modes, similar to the single-mode 
solution, whereas the twining part of Dk depends on the morphology of the distribution. The vector in the complex 
plane representing the dominant sharp peak has much larger amplitude than the other one. This fast convergence 
of Dk results from receding influence of vector superposition on the resulting 0fc from the vector which represents 
the less dominant density wave. Moreover, when a is tuned approaching tt, a dominant sharp peak is replaced by 
two compatible ones, whose locations are shifting towards symmetry to the origin in real domain, so two compatible 
vectors in the complex plane account for the slow convergence in Dk- Therefore, the degree of convergence in Dk 
depends on how asymmetrical are the representional vectors for the sinusoids are in the complex plane. 
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Figure 4 shows the evolution of the discrete phase gradient for the case of $0 = — cos(g) — cos(2g + 2.0). At 
the initial condition, only two modes have well-defined amplitudes and so only two phases really exist. However, 
rounding errors in the evaluation of the required integrals result in the assignment of a definite phase oven to those 
modes which have formally zero amplitude. These fictional phases are initially random when the evolution begins. 
The first structure appeared in Dk depends on the phase difference between the two sinusoids, in which Dk from low- 
frequency modes flip between and —a. Fine details of the structure appear when higher harmonics are spawned by 
the collapsing waves with a definite phase relationship leading to sharp peaks that eventually become caustics. Those 
spawned modes not only produce the twining characteristic in Dk, but also interfere with the phases of low- frequency 
modes. 



3.3 Comments 

We have analysed these simple cases of two interacting plane waves in an attempt to develop a quantitative under- 
standing of the kind of structure one can hope to see in the phases of non-linearly evolved density fields. 

The results obtained depend very sensitively on the initial phase difference. The reason for this is that different 
real-space processes become confused when viewed in the Fourier domain, particularly in the phase representation. 
For one thing, each initial sinusoid begins to collapse in the same manner as the single-mode solution described 
in §3.1. This spawns Fourier modes with a definite phase relationship to the initial mode. This process generates 
high-frequency modes which interfere and produce a pattern in the return map that depends quite sensitively on the 
symmetry of the situation, i.e. on the initial relative phase of the two modes. If higher-frequency modes like this had 
actually been present in the initial conditions they would be affected by the generation of higher-frequency modes 
by the collapse of long wavelength perturbations. Obviously, there is no point in continuing the discussion to the 
interaction between three, four and n-modes. 

But despite this complexity, what these examples showis that the relatively simple quantity Dk, defined by eq. 
(21), shows an interesting response to the non-linearity induced by mode coupling and which quantitatively encodes 
some of the structure seen in the return maps. In Section 5 we discuss how to exploit the information contained in 
this simple statistic in the analysis of more realistic distributions obtained from an arbitrary superposition of initial 
modes, rather than just two. We shall see that the clues obtained from the illustrative examples above suggest a new 
and potentially powerful method for the analysis of phcise information. 



4 PHASES FOR RANDOM PROCESSES 

The previous section dealt with toy examples involving initial perturbations that were periodic on a given interval. 

In cosmology, the initial density field will not be periodic but will be in the form of one realisation of a random 
process. There are some subtleties involved in defining amplitudes and phases for such perturbations and these are 
often glossed over in the cosmological literature, although they are well known in the literature pertaining to random 
processes (e.g. Priestley 1981), so we will take the opportunity here to outline these problems and their resolution 
fairly carefully. In particular, a realisation of a random process can formally be expressed neither by a Fourier series 
(which assumes periodicity) nor a Fourier integral (which requires the function to tend to zero at ±oo). This section 
is about the correct mathematical way of treating the spectral representation of such a function, following the ideas 
of Wiener (1930). 

To keep things as simple as possible and consistent with the previous section, consider a process 5(x) which is 
one realisation of a stochastic process on an infinite domain (i.e. one for which x can take any value on the real line). 
The generalisation of this to a three-dimensional field is entirely straightforward but the notation is more messy. Now 
consider the part of this realisation that lies in the range —L < x < +L. We can make a new function of d{x) that is 
periodic by defining a new function 6l{x) such that 



3l{x) 
5l{x + 2rnL) 



5{x) (-L <x< +L) 
Sl{x) (m = ±1,±2±3,...) 



(22) 



The function 5l is periodic with period 2L so let us express it as a Fourier series: 




(23) 



where 



kn = mr/L and A„ is given by 
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1 /•+^ 

A„ = — / SLix)exp{-iknx). (24) 
Thus, 



L 



S{x) ^ —r= ^ GLikn)exp{ik„x)Ak„ (25) 



27r 

n— — oc 



where GL(k) is defined, for any k, via 

1 1 

Ghik) = / 5L(a;) exp(— ifca;)d2: = — -= / 5{x) eyip{—ikx)dx (26) 



where Afcn = fc„+i — fcn = Tr/i. A spectral density function P(fc) can be defined by 

L — *oo Zl^ 

where the expectation value {•) is over the probability distribution. Note that \GL{kn)\ cx) as L oo but 
|GL(fc„)|A(fc„) ^ as L ^ oo. 

Now we define the function Shik) via 

hik) = -L / GL{q)dq, (28) 



/27r _ 
so that 

A5i,(fc„) = 5L(fc„+i) - 5i,(fc„) ~ ^GL(fe„)Afc„ (29) 
for small values of Afc„, i.e. for large L. Ifence, in this limit, 

{|A5i(fc„)|^) ~ (lGi(fc„)l^^)Afc„ = ( |g^(^")l' )Afc. (30) 
and, as L — + oo so that Afc„ 

(|A5L(fc„)|') ~ P{k„)Ak„ ~ AO(fc„), (31) 
where 

Q{k) = /" P(g)dg (32) 

— oc 

is the integrated spectrum of 5{x). 

The equation ( psj ) provides a spectral representation for Sl{x) for all a; but which is only valid for S{x) in the 
range —L < x < +L. To get a representation that is valid for all x we would like to let i ^ cxa in (^5|), but GL{k„) 
does not converge to a finite value as L — > oo. However, GL(kn)Akn is well-behaved so we can write, in the range 
-L<x< +L, 

S{x) = 5l{x) ^ ^ A5L{kn)ex.p{ik„x). (33) 

n— — OO 

Now is the time to let L ^ co so that Akn 0. The sum on the right-hand-side then converges in the required 
"mean-square" sense to a Stieltjes integral: 

/oo 
dS{k)exp{ikx), (34) 
- oo 

where 

d5{k) = lim A<5i,(fc„). (35) 

L — ►oo 

The limit of equation (^l|) then becomes 

{\d5{k)f) = dQ{k) = P{k)dk. (36) 

But note that dS{k) is not the Fourier transform of S{x); that role is more properly played by a term of the form 
dS{k)/dk. Following these considerations we can write 



<5(fci) - 6{k2) = 2^j y — j 5{x)dx. (37) 
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The expression has important consequences for the phases of the Fourier components of the process S[x). 
First, note that {5{x)) — 0. This requires that {Ghik)) — which, in turn, means that {ASL{k)) — and this, in 
turn, means that {dSL{k)) = for all k. Now, taking the complex conjugate of equation ( p^ yields 



S*{x)^ / dS* (k) exp{-ikx) (38) 

J —oo 

and replacing x by x + r in equation (j34|) yields 

/oo 
dS{k)exp{ik{x + r)). (39) 
' OC 

Multiplying these two expressions together and taking the expectation value leads to 

poo poo 

{S''{x)S(x + r)) = / exp{ix[k' ~ k])exp{ik'r){dS*{k)dS{k)). (40) 



oc >^ — oc 



Since S{x) is assumed to be statistically homogeneous, i.e. its probability distribution and all moments are invariant 
with respect to translations in x, the left-hand-side of this equation has to be the autocovariance function of the 
process £,{r). The right-hand-side must therefore be a function of r only. The contribution to the double integral must 
therefore vanish when k ^ k' . In other words, 

{dS*{k)d5{k')) ^0 (41) 

if fc 7^ fc'. This tells us immediately that the phases of the increments of the process, arg(d(5(fc)) = 0fc must be random, 
in the sense that 

{expi{ct>k~(l>k')} =0 (42) 

for k ^ k' , since the above argument applies regardless of the form of dS*{k). The term correlation is usually taken to 
refer to the association expressed by expectation values like that on the left-hand-side of equation (^2|), i.e. second- 
order association. Statistical homogeneity thus requires that phases of the components of S{x) should be uncorrelated 
and is not something to do with whether the process 5{x) is Gaussian. Notice also that the expectation value in 
equation (42) is an ensemble average, and this constraint does not apply necessarily to an average over a finite piece 
of a single realisation. 

Of course the condition (^^ does not mean that all phases must be statistically independent; it is weaker than 
the requirement that the joint probability distributions of two phases be separable. Neither does it mean that no 
statistical quantity based on the 0^ recovered from a single realisation can be found that is sensitive to the non- 
Gaussianity of S{x). What it does show, however, is that one has to look further than methods based on second-order 
properties of the distribution of phases. This is the challenge we address in the next section. 



5 QUANTIFYING PHASE INFORMATION 
5.1 Preamble 

From the examples discussed in Section 3, one can see that the signature of non-linear mode coupling is that the 
phases of the Fourier modes on large scales couple in certain relationship, depending on the morphology of the density 
distribution. On smaller scales, however, the phases queue up in such a way that phase difference, or the discrete phase 
gradient Dk, converges. This suggests that it might be a good idea to construct a histogram of the phase gradients 
for each mode and use it as a test of phase correlations, interpreted in the wider sense. In a Gaussian random field, 
the phases are random, so the phase gradient will simply be the difference between two uniformly-distributed random 
variables and will itself be uniformly distributed between — tt and tt. For a non-Gaussian field, however, this might 
not be the case. The distribution of phase differences is therefore one possible way of using phase information as a 
diagnostic of non-Gaussianity. Notice that this histogram is more useful than the histogram of phases themselves, 
which would simply be the projection of a return map onto one of its axes: this would be uniform even in the 
highly structured distributions seen in the previous section. Although this particular approach is suggested by the 
convergence displayed in the simple examples shown above, there could of course be other forms of phase behaviour 
that it might also quantify. 

Taking a histogram from a set of data produces something more-or-less equivalent to a probability density 
distribution. We can exploit this idea to think of the information content of this probability density, by defining a 
quantity analogous to its entropy. At the start, however, we should stress that this should by no means be interpreted 
as a thermodynamic entropy, as it only relates to the pattern of phases and not to the whole spatial distribution of 
material in space. 
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5.2 Phase entropy 

We now define a quantity, the phase entropy, as 

S = -J2piiDk)ln[pi{Dk)]5ct>i, (43) 

— TV 

with 



= 1, (44) 



where pi{Dh)54>i is the probability of finding a value in the i-th bin. This definition takes into account changes in 
binning strategy for the construction of the histograms. If Scjji were infinitesimal, 

p(Dfc) In [p(Dk)] d4>. (45) 

If a data set consisting of Utot values of Dk is binned into a histogram of m bins, each of which has width 54> = 27r/m, 
the probability density for n events in the i-th bin is pi — n/ {ritot 5 (j>i). 

Now another virtue of Dt becomes apparent. The phase of each Fourier mode changes with the choice of origin, 
and so does the converging point of phase gradient. Any statistic we use should not depend on the choice of origin, 
or at least this should be easy to remove from the calculations. Changing the point of origin by a distance x changes 
the phase of the fc-th mode by an amount kx, so the effect on the individual phases depends on their wavenumber. 
But the phase gradient suffers a constant offset 

Dk = (j)k+i + (fc + l)x - (pk + kx = 4>k+i - ct>k + X = Dk + X, (46) 

which can be handled easily. Now take as an example a single density peak, located at a. The phase for the fc-th 
mode is — fca, so the phase gradient for all fc values of the Fourier decomposition of the density peak converges to the 
same value, 

Dk = ct>k+i - (j>k = -{k + l)a - {-ka) = -a. (47) 

The probability density distribution is the same for all values of a, even with a constant offset caused by 
different choice of origin. This approach to phase information weights all phases equally, and discards power spectral 
information. As shown in the previous sections, the way in which the Fourier phases queue up depends on the relative 
level of the most dominant density peak to the rest of the structure: the higher the dominant peak, the greater is the 
effect for large k. It is the 'queuing up' of phases of high-frequency modes that signifies the gravitational clustering 
towards the nonlinear regime, quite independently of the amplitude corresponding to each phase. The quantity Dk 
encodes information about the ratio of successive Fourier components: 

Sk+i ^ \Sk+i[ exp[j(^^^j _ ^^)] ^ AexpiiDk), (48) 

Ok \Sk\ 

where A is a real number, which is obviously neglected in our analysis. This is a potential weakness when it comes 
to applying these ideas in practice, because a mode with vanishing amplitude still has a phase which receives equal 
weight in Dk with those of much higher amplitude. In real applications, therefore one would probably want to combine 
amplitude and phase information in some way, perhaps by removing modes of very low amplitude from the analysis. 
We return to this in Section 7. For the rest of this paper we shall neglect this problem and look at phase-only 
information. 

The idea of phase entropy was suggested by Polygiannakis & Moussas (1995). It comes directly from Boltzmann's 
definition of entropy in thermodynamics. In an ensemble of v identical replicas, the statistical weight Qi, of its system 
when z/i are in state 1, 1^2 in state 2,. . ., and Vr in state r is 

= —r-^ — -.■ (49) 

The entropy of the ensemble is thus Si, ~ InfJ,^. If v is sufficiently large, Vr also become large, so In fJ^ ~ 1/ In 1/ — 
i^r In z/r. Taking pr — Vrjv, the probability of state r, we roach 5' ~ — Pr Inpr- In the analogy to the definition 
of phase entropy, r microstates correspond to the bins in a histogram, which are between — tt and vr. Following the work 
of Shannon (1948a,b) this definition of entropy S is closely related to the information content I of the distribution: 
S(jy) = —I{D). We do not wish to push the connection with Boltzmann entropy too far, however, and in any case 
the definition of phase entropy is self-contained. It is not necessary to think about an approximation scheme of Qv, 
i.e., requiring u to be very large. Nevertheless, it is still required to have large enough of phases so as to produce a 
smooth probability density distribution. In some cases, where the probability density distributions are highly skew 
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and therefore some bins in the histogram might have zero number density, the contribution to phase entropy from 
these bins is zero, as hmp,,_o Pr Inp,, = 0. 

For a Gaussian field, where the number of Fourier modes is sufficiently large, and the phases of each of these 
modes are random, the phase entropy is 

S^-^P^ lnp» 5<l,, = - f p{Dk) In [p{Dk)] d<l> = - f i- In (i-) d</. = In 27r. (50) 

It can be proven, by straightforward analogy with the isoperimetric problem of variational calculus, that phase entropy 
has its maximal value when the probability density function p[-Dfc] = 1/2'k is constant. The Gaussian process, wherein 
the distribution of phases (and phase differences) is uniform, corresponds to a state of maximum entropy, and the 
value of S = In 2tt is the maximum value that S can take in any situation. 

When an initially-Gaussian field evolves under gravitational instability, the phases 'queue up' in a certain fashion, 
and the phase gradient is no longer evenly distributed in the histogram. What happens then is that the entropy of the 
phases decreases or, in other words, the information content increases. This does not mean that the process violates 
the second law of thermodynamics, because information moves between the amplitudes and phases (and indeed, 
between the density and velocity parts of phase-space) as the system evolves. There is more information in the whole 
system than is contained only in the phases, but the information content of these phases does certainly increase under 
the action of gravitational clustering. 

Although our application of this idea is largely inspired by Polygiannakis & Moussas (1995), who applied it to the 
analysis of time- varying quantities in Solar Wind data, they did not take into account the fact that the straightforward 
plogp entropy of the phase histogram itself is not invariant under translations of the origin. In our application, we 
use S{D) rather than 5(0) to counter this problem, as above. 



6 NUMERICAL SIMULATIONS 

In order to study the behaviour of S in systems with more complex initial data, we have conducted a series of 
numerical N-body experiments. In order to keep these experiments as simple as possible and to obtain the best 
possible numerical resolution with the computer resources available to us, we have used two-dimensional simulations 
of 512^ particles on a 512^^ mesh (like those described in Beacom et al. 1991). 

Four different initial pure power law spectra P{k) oc k" are modelled (with n = 2, 1, 0, and —1) for each of which 
the initial phases are randomised using a random number generator. To see the systematic effects from different power 
spectral indices and exclude the influence from phases, we set the same phases for the same modes for each of the 4 
spectral indices. The boundary of each realisation is periodic. The simulations are equivalent to 3D Einstein-De Sitter 
{Q, = 1) universe with 2D perturbations, whose images are cross sections of 3D perturbations. The spectral index n 
in 2D is equivalent to n — 1 in 3D in statistical measure. We also label the evolutionary stages of the simulations 
according to fejvi where 



{{Sp/pf)k^, = fo+(t) / Pik) d'k = 1. (51) 



Here 6+(t) is the growing mode in (^) of the linear density contrast, and P{k) is the linear extrapolation of the initial 
power spectrum. This definition of kNL identifles the corresponding 27rfe^^ as the boundary between linearity and 
non-linearity. From this criterion and P{k) oc fc", k''^^ oc 6!j^^(i) oc (1 -|- z)^. We set the flnal stage of the simulations 
the final stage to have kNL ~ 2, and pick 8 stages with values of kNL varying by a factor of two in each one; the first 
stage has kNL = fcNyquist = 256. 

Before proceeding, it is worth illustrating the points we made in the introduction about the importance of phase 
information using these simulations. 

To extract a measure of the phase entropy from these two-dimensional N-body simulations, we simply take the 
probability density distribution of all phase gradients in kx direction, for Sx, and those in ky direction, for Sy, then 
we take the arithmetic mean, because of the additive nature of the definition. To express in a clearer way, we can, 
for example, write 

m 

5. = -^p(DfeJln[p(DfeJ] 5<^,, (52) 

i=l 

where 

D,^=cl>{i + l,j)-<t>ii,j) (53) 
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Figure 5. Phase entropy of the gravitational evolution from different initial power spectral indices is shown as a function of 
the rms density fluctuations. For each spectral index, N-body simulation for 5 initial conditions of different random phases are 
conducted. Phase entropy, at its maximal value ln27r at the start, decreases when the system evolves. The different spectral 
indices display different characteristic curves of this decrease in phase entropy. 



for 1 < i,j < 256, the upper limit being the Nyquist frequency of the simulations. The number of bins m is chosen 
according to the number of phases available, to ensure the probability density distribution is reasonably smooth. For 
a realisation of 512^ particles on a 512^ mesh, there are 256'^ phases. The phase gradient D{kx) for a certain j is the 
phase gradient for the corresponding strip in the x-direction of the density distribution in real space. By taking all 
the phase gradients in one direction, binning them into a histogram, we are scanning every strip and examining the 
morphology of the realisation in that direction. 

Phases not only suffer an offset at the choice of origin in real space, but also depend upon orientation. In reality 
there should not be any preferred orientation in which the phase entropy is calculated. If a large filament lies diagonally 
across a realisation, it intercepts with every strip of the realisation at different positions. The converging points of the 
phase gradient for every strip cover the interval [— tt, +tt] uniformly, and each contributes the same amount of number 
density to every bin in the histogram. This results in a large phase entropy. However, if we rotate the realisation 
so that the strip lies across the scanning direction, a lower value of S will be obtained. Here in Figure 5 all of the 
realisations are rotated to the orientation along which the phase entropy is at its minimal value. 

We perform 5 sets of different initial phases, so there are 40 realisations for each spectral index. In Figure 5, we 
plot the phase entropy against the variance of the density field in the realisation for all power spectra. (The variance 
is plotted by smoothing the density field onto the the grid using a box-counting algorithm). 

It is immediately clear that the phase entropy decreases as the density variance increases, in other words as the 
system evolves. This is in full accord with our intuition described above, and confirms that S is a potentially useful 
diagnostic statistic. Furthermore, the different spectral indices display different characteristic curves of this decrease 
in phase entropy. The trends can be explained qualitatively in terms of the known morphology of clustering generated 
by the different sots of initial conditions. The spectrum with n — —1 contains a relatively large amount of power in 
long-wavelength modes. The evolution of a system with this initial spectrum is characterized by the early generation 
of long filaments where the caustics form; sheets form in 3D, but these are not relevant to the 2D simulations used 
here. In this case we can always rotate the realisation so that the longest filament, which has the largest effect on S 
is parallel (or perpendicular) to the orientation along which the Fourier transform is performed. The phase entropy 
decreases on the formation of large filaments of size similar to the simulation box, which happens even when the 
variance is small. As n decreases so does the amount of large-scale power. In order to produce the large filaments 
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that result in a decrease of S, the system has to evolve further on small scales. In the most extreme case of n = 2, 
this results in a very high value for the variance when S starts to display phase structure. 

6.1 Scaling and Self-similarity 

Before further application of phase entropy, we have to address some important features of the simulations from initial 
pure power law spectra. The initial pure power law spectra possess no characteristic length scales and also evolves 
in a self-similar manner, which means that at different stages, the distribution should possess the same . For a self- 
similar physical system, time dependence can be singled out by rcscaling x, the spatial position, or p, the momentum. 
Therefore a distribution function /(x, p, t) has the same statistical measure as the rescaled one, A"/(x/A'', p/A", At), 
as t — > At, where a, /?, and v are to be decided (Jain & Bertschinger 1996). Self-similar gravitational clustering is an 
idealisation and can not be applied in detail to a realistic model. It nevertheless provides useful insights into more 
realistic situations. 

Since the phases of Fourier modes play an important role in recovering the morphology of a density distribution 
(Oppenheim & Lim 1981), it is interesting to see if there is any self-similar scaling of the phases in gravitational 
evolution. As well as the 512^ N-body simulations discussed in the previous section, we also conduct 2048^ simulations 
with initial pure power law spectra, n = 2, 1, and 0. With fejvz, = 2 as the final stage (0 = 0), there are 10 stages, 
denoted stage a, b . . .j. To distinguish large- and small-scale phases, a realisation of a 2048^^ simulation box is divided 
into 2^ ( scale length k = 2048/2 = 1024), 4^ {h = 512), 8^ (4 = 256), 16^ {h = 128), etc. boxes (sub-realisations). 
On each scale length Is, we calculate the phase entropy for each sub- realisation, and then take average of them over 
that scale. By doing so, we are averaging phase entropy for the same scale length through many sub-realisations. The 
lower limit of the scale length of sub-realisations chosen is the one which still has sufficient large number of phases. 
Here we choose as the smallest scale length Is = 64, by dividing a 2048^ realisation into 1024 sub-realisations of 
a 64'^ mesh. Each 64'^ mesh offers merely 32^ phases binned into a histogram of 40 bins. Therefore, for each 2048'^ 
realisation, the phase entropy appears on 5 different scale lengths, and, from large to small Is , more and more phases 
from large scales are excluded from the calculation. It is expected that phase entropy decreases through large to 
small scales, as structure on small scales goes nonlinear while that on large scales is still in the linear regime. To 
simplify the calculations and for completeness, the sub-realisations are not rotated to minimise the phase entropy in 
this case. Taking Is = 512 as an example, we divide a realisation of a 2048^ simulation box into (2048/512)^ = 16 
sub-realisations, each of which is a 512^^ mesh, and calculate the phase entropy for each sub-realisation, before taking 
average of the phase entropy of these 16 sub-realisations. Structure scales larger than 512, 1/4 of the side of the whole 
simulation box, are left out, what is included in the calculation being the structure up to the scale length Is. 

In Figure 6, the phase entropy is shown on 5 scale lengths for stage / to j. The phase entropy decreases from 
large to small Is for all three spectral indices. For stage g of spectral index n = 2, the phase entropy decreases 
drastically from scale length Is — 128, to Is = 64, which can be viewed as follows. The stages of simulations are 
chosen to indicate the boundary between linearity and nonlinearity and that there is factor of two between any 
two adjacent stages. For stage g (kNL = 16), the boundary is on a box size of (2048/16)^ = 128^ meaning that 
structure within a 128^ box placed anywhere in the simulation box has gone nonlinear in a statistical sense. Thus 
the transition from linearity to nonlinearity across the boundary is shown clearly via the large decrease of the phase 
entropy from length scale Is = 128 to Is — 64. The same situation applies to stage h from Is = 256 to h = 128, where 
Is = 256 corresponds to the boundary size of nonlinearity; so does stage i from Is = 512 to 256, and stage j from 
Is = 1024 to 512. What is more interesting in Figure 6 is that phase entropy seems to stay at the same level for the 
same degree of nonlinearity in the plot. Stage g at scale length Is = 64 possesses the same level of phase entropy 
as stage h at Is = 128, so does stage i at Is = 256, and stage j at h = 512. If fejvL is used to denote the stages, 
S{kNL,ls) = 5(2,512) ~ 5(4,256) ~ 5(8,128) ~ 5(16,64). This result is not surprising at all. As fcjvi are chosen 
twofold for two adjacent stages in the self-similar evolution, the distribution of stage m looks the same statistically 
as stage m + 1 when blown up by a factor of two. For any scale length Is of stage m, it should retain the same level of 
phase entropy as the next stage m-l- 1 at double scale length, 2Zs. Therefore, the phase entropy is indeed an indicator 
of the level of nonlinearity (or linearity), and demonstrates the roughly self-similar nature of the evolution, which 
can be expressed as 

S{kNL,ls) = S{2kNL,ls/2), (54) 

n+2 

as kNL — » 2kNL, or equivalently, (1 + ^;)— »-2 2 z). All the stages can be scaled by phase entropy into one single 
characteristic curve, S(kNLls)- 

This simple picture is complicated by a number of factors which produce a discernible slope in Figure 6, which 
displays a perceptible tilt toward small Is ■ This disagreement between points of the same level of nonlinearity may 
be due to the fact that 5 does not scale in a self-similar fashion. But before this can be concluded we have to 
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Figure 6. The phase entropy is shown for five length scales Is = 64, 128, 256, 512 and 1024. A denotes stage /, o denotes stage 
g, X stage h, □ stage i, and * stage j. Clearly, phase entropy decreases towards small length-scale as expected. For the same 
stage of n > 0, the transition in scale from linearity to nonlinearity can be seen from the large decrease in phase entropy by 

decreasing Is, e.g., for stage g from Is = 128 to 64, stage h from Is = 256 to 128, stage i from Is = 512 to 256, and stage j from 
Is = 1024 to 512. Also phase entropy seems an indicator to nonlinearity of gravitational clustering. The amount of decrease in 
phase entropy is roughly the same for the transition from linearity to nonlinearity for every stage, which implies self-similar 
nature of gravitational clustering for pure power law spectra. 



contend with several complicating factors. First, for large Is, there are fewer sub-realisations to be averaged, and 
the sub-realisations in this experiment are not rotated to obtain the value of the phase entropy. This is expected to 
artificially increase the phase entropy on large-scales compared to small. This is probably more pronounced for small 
n where filaments dominate. For small h the averaging of more sub-realisations erases this effect. Using larger bins 
to compensate for the smaller rmmber of phases available also lowers the phase entropy. We therefore see points at 
the same level of phase entropy tilt towards small length scales. Another numerical effect comes in the when both 
kNL and h are small. In forming clusters, galaxies aggregate at the expense of forming voids. When the evolution 
goes into the highly nonlinear regime, voids dominate the realisation and many of the sub-realisations become empty. 
This produces an artificial contribution to the entropy determined partially by the bin size. 

But while the exact form of scaling remains to be determined by eliminating these possible systematics and 
studying the scaling of S for a wider range of spectral indices, this study at least shows that phase entropy has 
qualitatively the right properties for a diagnostic of gravitational evolution. 



7 DISCUSSION AND CONCLUSIONS 

The aim of this paper was to explore the behaviour of phases of Fourier components of cosmologieal density fluctu- 
ations in order to understand how phase information relates to the development of clustering pattern from random- 
phase (Gaussian) initial conditions. 

We have studied some simple cases of the evolution of Fourier phases under the action of gravitational instability 
and used these cases to motivate the definition of a measure of the information contained in these phases. The 
statistic we propose, S, which behaves as a kind of information entropy, shows interesting time-evolution and appears 
to undergo a form of scaling when the initial conditions on which gravity acts are self-similar. 

Although this statistic looks like a promising candidate as a diagnostic for phase coupling, there are many 
problems to be overcome before it can be applied to real data. For example, it remains to be seen how S can 
be corrected for shot-noise (discreteness) errors and also how to correct for the imposition of a selection function. 
Redshift-space effects also need to be taken into account. These are inevitably more difficult to handle for a higher- 
order statistic than with the correlation function or the power-spectrum. There is also the problem, alluded to in 
Section 5, that even Fourier modes with negligible amplitudes have a phase, and our statistic does not take account 
at all of amplitude information. In a practical application one would probably therefore prefer to combine amplitude 
information with phase information. This is what is done to some extent with the bispectrum (Matarrese et al. 
1997) and the second spectrum (Stirling &i Peacock 1996). But note that the bispectrum, for example, only contains 
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information about third-order moments of the Fourier components. It does not therefore include all the information 
that resides in the distribution of phases. Knowledge of all higher-order polyspectra would be necessary to define the 
total information content of the Fourier phases. For this reason it is certainly worth investigating statistics based 
explicitly on the phases, rather than indirectly on them as is done with the bispectrum or any other piecemeal 
approach higher-order moments. 

In future we hope to test the usefulness of phase statistics compared with more standard methods. In any case 
we stress the point that phase information is completely complementary to amplitude information: no information 
about localised geometrical structures is contained in the amplitudes themselves, so phase information however it is 
encoded is vital to a full analysis of clustering pattern. 

As we discussed in the introduction, phase information per se has not been extensively used in galaxy clustering 
analyses so far. We hope to develop tools that use this information to provide sensitive tests of cosmological models. 
In the context of galajcy clustering, what we need to understand using further N-body experiments is how the growth 
of phase information relates to the growth of the power spectrum. It is known that gravitational instability acts 
on initially Gaussian perturbations to produce fluctuations that are non-Gaussian but which are characterised by 
higher-order moments, such as the skewness {S"^) that couple to the variance (5^) in a well-defined way (e.g. Coles 
& Frenk 1991; Bernardeau 1992). One of the possibilities we shall explore is how the growth of phase information 
itself couples to the growth of the Fourier amplitudes. If this coupling can be quantified, then one can hope to use 
it as a diagnostic of clustering. For example, many clustering models appeal to the presence of a bias to enhance 
the clustering pattern over that generated by gravity (e.g. Coles 1993). A linear bias b affects the Fourier amplitudes 
but not their phases. One can therefore hope to distinguish a biased power spectrum from one generated solely 
by gravitational instability: for the same amplitude, the former should have a higher phase entropy (lower phase 
information) than the latter. By quantifying the phase information on very large scales we also hope to test initial 
non-Gaussianity, if the non-Gaussianity developed from initially Gaussian fluctuations can be understood. We shall 
return to these and related questions in future work, but the scaling described in Section 6 is grounds for optimism 
that this can be achieved. 

We also take this opportunity to point out that the phase entropy can in principle be applied not just to density 
fluctuations, but also as a diagnostic of non-Gaussianity in microwave background temperature fluctuations. With 
tentative claims already being made for the detection of non-Gaussian effects in the COBE data in an analysis based 
on bispectra (Ferreira, Magueijo & Gorski 1998), though not without opposition (Bromley & Tegmark 1999), the pre- 
Planck development of sensitive yet robust statistical indicators is a matter of urgent priority. Phases can be defined 
for a spherical harmonic expansion in a similar way to that of the Fourier modes and may provide more information 
about the form of non-Gaussian behaviour present in existing CMB maps as well as offering more powerful and 
sensitive descriptors of future data sets. 
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